Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization
Data-driven algorithm design, that is, choosing the best algorithm for a
specific application, is a crucial problem in modern data science.
Practitioners often optimize over a parameterized algorithm family, tuning
parameters based on problems from their domain. These procedures have
historically come with no guarantees, though a recent line of work studies
algorithm selection from a theoretical perspective. We advance the foundations
of this field in several directions: we analyze online algorithm selection,
where problems arrive one-by-one and the goal is to minimize regret, and
private algorithm selection, where the goal is to find good parameters over a
set of problems without revealing sensitive information contained therein. We
study important algorithm families, including SDP-rounding schemes for problems
formulated as integer quadratic programs, and greedy techniques for canonical
subset selection problems. In these cases, the algorithm's performance is a
volatile and piecewise Lipschitz function of its parameters, since tweaking the
parameters can completely change the algorithm's behavior. We give a general
sufficient condition, dispersion, defining a family of piecewise Lipschitz
functions that can be optimized online and privately, which includes the
functions measuring the performance of the algorithms we study. Intuitively, a
set of piecewise Lipschitz functions is dispersed if no small region contains
many of the functions' discontinuities. We present general techniques for
online and private optimization of the sum of dispersed piecewise Lipschitz
functions. We improve over the best-known regret bounds for a variety of
problems, prove regret bounds for problems not previously studied, and give
matching lower bounds. We also give matching upper and lower bounds on the
utility loss due to privacy. Moreover, we uncover dispersion in auction design
and pricing problems.
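To make the online optimization setting concrete, the following minimal Python sketch (not code from the paper; the utility functions, learning rate lam, and grid resolution are illustrative assumptions) runs a continuous exponentially weighted forecaster over a one-dimensional parameter, approximated on a finite grid. The toy utilities are piecewise constant with discontinuities spread across the interval, the kind of dispersed family the condition is meant to capture.

    import numpy as np

    rng = np.random.default_rng(0)

    def exp_weights_forecaster(utility_fns, lam=1.0, grid_size=1000):
        # Full-information online optimization of a parameter rho in [0, 1]:
        # each round, sample rho with probability proportional to
        # exp(lam * cumulative past utility at rho), then observe the round's
        # piecewise Lipschitz utility and update the cumulative utilities.
        grid = np.linspace(0.0, 1.0, grid_size)
        cumulative = np.zeros(grid_size)
        total = 0.0
        for u in utility_fns:
            weights = np.exp(lam * (cumulative - cumulative.max()))  # stabilized
            rho = rng.choice(grid, p=weights / weights.sum())
            total += u(rho)
            cumulative += np.array([u(x) for x in grid])
        return total

    # Toy utilities: thresholds whose discontinuities are spread out over the
    # interval (an informally "dispersed" family): u_t(rho) = 1 if rho >= theta_t,
    # else 0.5, with theta_t drawn uniformly from [0.2, 0.8].
    thetas = rng.uniform(0.2, 0.8, size=50)
    fns = [lambda rho, th=th: 1.0 if rho >= th else 0.5 for th in thetas]
    print(exp_weights_forecaster(fns))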
Subset-Based Instance Optimality in Private Estimation
We propose a new definition of instance optimality for differentially private
estimation algorithms. Our definition requires an optimal algorithm to compete,
simultaneously for every dataset D, with the best private benchmark algorithm
that (a) knows D in advance and (b) is evaluated by its worst-case
performance on large subsets of D. That is, the benchmark algorithm need not
perform well when potentially extreme points are added to D; it only has to
handle the removal of a small number of real data points that already exist.
This makes our benchmark significantly stronger than those proposed in prior
work. We nevertheless show, for real-valued datasets, how to construct private
algorithms that achieve our notion of instance optimality when estimating a
broad class of dataset properties, including means, quantiles, and
ℓ_p-norm minimizers. For means in particular, we provide a detailed
analysis and show that our algorithm simultaneously matches or exceeds the
asymptotic performance of existing algorithms under a range of distributional
assumptions.
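As a concrete reading of the benchmark in the case of means, the following hypothetical Python sketch (the function name and removal budget k are illustrative, not taken from the paper) evaluates an estimate against the worst-case mean over all subsets obtained by deleting at most k data points; for the mean these extremes are reached by dropping the k smallest or the k largest points.

    import numpy as np

    def subset_benchmark_error_for_mean(data, estimate, k):
        # Worst-case |estimate - mean(S)| over all subsets S of the data obtained
        # by removing at most k points. For the mean, the extreme subset means are
        # reached by dropping the k smallest or the k largest points.
        x = np.sort(np.asarray(data, dtype=float))
        highest_subset_mean = x[k:].mean()           # drop the k smallest points
        lowest_subset_mean = x[:len(x) - k].mean()   # drop the k largest points
        return max(abs(estimate - highest_subset_mean),
                   abs(estimate - lowest_subset_mean))

    # Toy usage: score a Laplace-noised sample mean against the benchmark.
    # (Illustrative only; a calibrated private mean would clip the data and
    # scale the noise to the clipped sensitivity.)
    rng = np.random.default_rng(1)
    data = rng.normal(0.0, 1.0, size=1000)
    noisy_mean = data.mean() + rng.laplace(scale=1.0 / len(data))
    print(subset_benchmark_error_for_mean(data, noisy_mean, k=10))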
- …